
Prevent primary hpa collision for keda scaled objects when migrating from an hpa #1677

Merged: 1 commit merged into fluxcd:main on Feb 11, 2025

Conversation

jdgeisler (Contributor)

This PR fixes #1646, where Flagger fails to create the primary HPA for a KEDA ScaledObject when migrating from a regular HPA to a ScaledObject in a canary.

Using the scaledobject.keda.sh/transfer-hpa-ownership: "true" annotation, a KEDA ScaledObject can take ownership of an existing HPA. Currently, annotations are not copied from the canary ScaledObject to the primary ScaledObject, so this PR adds that functionality.

Additionally, we need to add the Advanced.HorizontalPodAutoscalerConfig.Name field to the ScaledObjectSpec so that we can specify the name of the HPA the ScaledObject will take ownership of. When creating the primary ScaledObject from the canary, we use the same HPA name with -primary appended.
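For illustration, a canary ScaledObject that uses both the annotation and the new field might look like the sketch below. The podinfo names, namespace, and CPU trigger are placeholders rather than values taken from this PR.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: podinfo
  namespace: test
  annotations:
    # Lets the ScaledObject adopt the HPA that already targets the workload
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      # Name of the existing HPA to take ownership of
      name: podinfo
  scaleTargetRef:
    name: podinfo
  minReplicaCount: 2
  maxReplicaCount: 4
  triggers:
  - type: cpu
    metricType: Utilization
    metadata:
      value: "90"

With this change, Flagger copies the annotation to the primary ScaledObject and sets horizontalPodAutoscalerConfig.name to podinfo-primary, so KEDA adopts the HPA that Flagger already manages for the primary workload instead of trying to create a second one.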

Testing

With the above changes, we can successfully use the transfer-hpa-ownership annotation and specify the HPA that each ScaledObject takes ownership of. This prevents the failure where KEDA tries to create the new primary HPA but cannot because it already exists; instead, KEDA adopts and modifies the existing one.

I applied the following diff to test in a Kubernetes cluster:

fortio, fortio-server-deployment-2, Canary (flagger.app) has changed:
  # Source: fortio/templates/fortio.yaml
  apiVersion: flagger.app/v1beta1
  kind: Canary
  metadata:
    name: fortio-server-deployment-2
    namespace: fortio
  spec:
    analysis:
      interval: 30s
      maxWeight: 50
      metrics:
      - interval: 30s
        name: request-success-rate
        thresholdRange:
          min: 99
      stepWeight: 10
      threshold: 10
      webhooks:
      - metadata:
          async: "on"
          c: "6"
          labels: fortio-server-deployment-2
          load: Start
          nocatchup: "on"
          qps: "12"
          save: "on"
          stdclient: "on"
          t: 600s
          uniform: "on"
          url: http://fortio-server-deployment-2-canary.fortio.svc.cluster.local:8080/
        name: generateLoad
        timeout: 60s
        type: pre-rollout
        url: http://fortio-client.fortio.svc.cluster.local:8080/fortio/rest/run?jsonPath=.metadata
    progressDeadlineSeconds: 600
    autoscalerRef:
-     apiVersion: autoscaling/v2
-     kind: HorizontalPodAutoscaler
+     apiVersion: keda.sh/v1alpha1
+     kind: ScaledObject
      name: fortio-server-deployment-2
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: fortio-server-deployment-2
    service:
      port: 8080
      targetPort: 8080
fortio, fortio-server-deployment-2, HorizontalPodAutoscaler (autoscaling) has been removed:
- # Source: fortio/templates/fortio.yaml
- apiVersion: autoscaling/v2
- kind: HorizontalPodAutoscaler
- metadata:
-   name: fortio-server-deployment-2
-   namespace: fortio
- spec:
-   maxReplicas: 4
-   metrics:
-   - resource:
-       name: cpu
-       target:
-         averageUtilization: 99
-         type: Utilization
-     type: Resource
-   minReplicas: 2
-   scaleTargetRef:
-     apiVersion: apps/v1
-     kind: Deployment
-     name: fortio-server-deployment-2
+
fortio, fortio-server-deployment-2, ScaledObject (keda.sh) has been added:
-
+ # Source: fortio/templates/fortio.yaml
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+   name: fortio-server-deployment-2
+   namespace: fortio
+   annotations:
+     scaledobject.keda.sh/transfer-hpa-ownership: "true"
+ spec:
+   advanced:
+     horizontalPodAutoscalerConfig:
+       name: fortio-server-deployment-2
+   scaleTargetRef:
+     name: fortio-server-deployment-2
+   minReplicaCount: 2
+   maxReplicaCount: 4
+   triggers:
+   - type: prometheus
+     metadata:
+       serverAddress: http://kube-prometheus-stack-prometheus:9090/prometheus
+       threshold: '100'
+       query: sum(rate(istio_requests_total{destination_service_name="fortio-server-2"}[1m]))
+   - type: cpu
+     metricType: Utilization
+     metadata:
+       value: "99"

See that the primary ScaledObject successfully takes over ownership of the primary HPA and no longer fails.
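For reference, based on the behaviour described above, the primary ScaledObject generated by Flagger is expected to look roughly like the sketch below (trigger configuration omitted; only the ownership-related fields are shown):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: fortio-server-deployment-2-primary
  namespace: fortio
  annotations:
    # Copied from the canary ScaledObject
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      # Canary HPA name with -primary appended
      name: fortio-server-deployment-2-primary
  scaleTargetRef:
    name: fortio-server-deployment-2-primary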

% k get so
NAME                                 SCALETARGETKIND      SCALETARGETNAME                      MIN   MAX   TRIGGERS     AUTHENTICATION   READY   ACTIVE    FALLBACK   PAUSED    AGE
fortio-server-deployment-2           apps/v1.Deployment   fortio-server-deployment-2           2     4     prometheus                    True    Unknown   False      True      170m
fortio-server-deployment-2-primary   apps/v1.Deployment   fortio-server-deployment-2-primary   2     4     prometheus                    True    False     False      Unknown   170m
% k get hpa
NAME                                 REFERENCE                                       TARGETS               MINPODS   MAXPODS   REPLICAS   AGE
fortio-server-deployment-2-primary   Deployment/fortio-server-deployment-2-primary   0/100 (avg), 4%/99%   2         4         2          13d

@jdgeisler jdgeisler requested a review from stefanprodan as a code owner July 10, 2024 15:16
@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch 2 times, most recently from b446861 to c8859f4 on July 10, 2024 15:26
@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from c8859f4 to 5965983 on August 13, 2024 14:16
@jdgeisler (Contributor, Author)

Hey @stefanprodan and @aryan9600, looking for some feedback here on this PR and/or the issue it fixes #1646

@aryan9600 (Member) left a comment

thanks for raising this PR! i did a quick first pass. will review it once more in the coming days.

@MohammedShetaya

Thanks for the fix @jdgeisler

@jdgeisler jdgeisler requested a review from aryan9600 October 16, 2024 14:11
codecov-commenter commented Oct 26, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 6 lines in your changes missing coverage. Please review.

Project coverage is 39.42%. Comparing base (03d4acc) to head (21acd7e).
Report is 7 commits behind head on main.

Files with missing lines: pkg/canary/scaled_object_reconciler.go — patch 80.00%, 4 missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1677      +/-   ##
==========================================
+ Coverage   39.30%   39.42%   +0.11%     
==========================================
  Files         284      284              
  Lines       22391    22422      +31     
==========================================
+ Hits         8801     8840      +39     
+ Misses      12643    12632      -11     
- Partials      947      950       +3     


@aryan9600 (Member) left a comment

lgtm! 🎖️
could you squash the two commits into one with an appropriate message and then rebase on top of main?

@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch 2 times, most recently from 7104b35 to 4aae393 on October 28, 2024 14:27
@jdgeisler (Contributor, Author)

lgtm! 🎖️ could you squash the two commits into one with an appropriate message and then rebase on top of main?

Done, thanks!

@jdgeisler (Contributor, Author)

Is this ready to be merged now? @aryan9600

Thanks!

@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from 091c56b to 4aae393 on February 10, 2025 16:15
@jdgeisler jdgeisler force-pushed the keda-scaled-object-hpa-migration branch from 4aae393 to 21acd7e on February 10, 2025 16:25
@aryan9600 aryan9600 merged commit 660ed74 into fluxcd:main Feb 11, 2025
17 checks passed
Successfully merging this pull request may close these issues.

KEDA Scaled Object Primary creation failed due to workload already managed by the hpa